Abstract — The Progressive Edge Growth (PEG) algorithm is
one of the most widely-used methods for constructing finite length
LDPC codes. In this paper we consider the PEG algorithm 
together with a scheduling distribution, which specifies the order 
in which edges are established in the graph. The goal is to 
find a scheduling distribution that yields "the best" performance
in terms of decoding overhead, a performance metric specific to
erasure codes and widely used for upper-layer forward error 
correction (UL-FEC). We rigorously formulate this optimization 
problem, and we show that it can be addressed by using 
genetic optimization algorithms. We also exhibit PEG codes with 
optimized scheduling distribution, whose decoding overhead is 
less than half of the decoding overhead of their classical-PEG 
counterparts. 

Index Terms — LDPC codes, bipartite graph, PEG, UL-FEC, 
decoding overhead/inefficiency. 

I. Introduction 

Data loss recovery - for instance, for content distribution 
applications or for distributed storage systems - is widely 
addressed using erasure codes that operate at the transport 
or the application layer of the communication system. These 
codes, referred to as upper-layer (UL) codes, extend source 
data packets with repair (redundant) packets, which are used 
to recover the lost data at the receiver. They are generally 
proposed in conjunction with physical layer codes, in order to 
maximize the reliability of the transmission system, especially 
in case of intermittent connectivity or deep fading of the
signal for short periods. In such situations, the physical layer
FEC fails and we can either ask for retransmission (possible only
if a return channel exists, and costly in broadcast/multicast
scenarios) or use UL-FEC. Hence, the use of UL-FEC codes
is of critical importance in broadcast communication systems 
in general, and satellite communications in particular. 

Low Density Parity Check (LDPC) codes constitute a very 
broad class of FEC codes, distinguished by the fact that 
they are defined by sparse parity-check matrices, and can be 
iteratively decoded in linear time with respect to their block- 
length. Invented by Gallager in the early 60's [1], but considered
impractical to implement, these codes were neglected for
more than three decades, and "rediscovered" in the late 90's
0. Nowadays, a large body of knowledge has been acquired 
(analysis, optimization, construction); LDPC codes are known 
to be capacity approaching codes for a large class of channels 
0, and became synonymous with modern coding. 

This work was supported by the French National Research Agency (ANR), 
grant No 2009 VERS 019 04 - ARSSO project. 



However, this capacity approaching property holds in the 
asymptotic limit of the code length, and codes optimized from 
this asymptotic perspective may suffer significant performance 
degradation at practical lengths. Actually, the asymptotic
optimization, performed by using density-evolution methods [4],
yields an irregularity profile, which specifies the distribution
of node-degrees in the bipartite (Tanner) graph [5] associated
with the code. It is assumed that the girth¹ of the bipartite
graph goes to infinity with the code-length. Hence, optimized
irregularity profiles can be used to construct codes that are
"long enough" (at least a few thousand bits) to avoid short
cycles, although they must be "short enough" to be practical.

One of the most widely-used methods for constructing
finite length codes is the Progressive Edge Growth (PEG)
algorithm [6]. It constructs bipartite graphs with large girth,
by establishing edges progressively: the graph grows in an 
edge-by-edge manner, optimizing each local girth. There is 
an underlying edge order within the PEG, corresponding to 
the order in which edges are established in the graph. In 
general, edges are progressively established starting with those 
incident to symbol-nodes of degree-2 and ending with those 
incident to symbol-nodes of maximum degree. However, any 
other order with respect to the symbol-node degrees would 
also be possible. Besides, for a given symbol-node degree, 
edges can be established in a node-by-node manner (all edges 
incident to some symbol node are established before moving 
to the next symbol-node), or in a degree-by-degree manner (a 
first edge is established for each symbol-node, then a second 
edge is established for each symbol-node, and so on until all 
the symbol-nodes reach the given degree). Although this order 
may significantly impact the performance of the constructed 
code, it is rather difficult to formalize and has practically not 
been investigated in the literature. There are however several 
papers that aim to enhance the PEG construction by optimizing
some objective function, for instance by minimizing the
number of cycles created [7], or minimizing the approximate
cycle extrinsic (ACE) message degree [8], [9].

In this paper we consider the PEG algorithm together with a 
scheduling distribution, which will be referred to as scheduled- 
PEG, or SPEG for short. Within the SPEG algorithm, symbol- 
nodes are divided into subsets, each subset containing symbol- 
nodes of same degree. Edges incident to the symbol-nodes of 
a subset are established in a degree-by-degree manner, before 
moving to the next subset. The scheduling distribution specifies
the fraction of nodes within each subset. Our purpose is to
find a scheduling distribution that yields the best performance
in terms of decoding overhead (performance metric widely
used for UL-FEC). We rigorously formulate this optimization
problem, and we show that it can be addressed by using genetic
optimization algorithms.

1 Length of a shortest cycle.

The paper is organized as follows. Section II gives a brief
overview of the basic theory and definitions related to LDPC
codes, their iterative decoding, and the associated performance
metrics over the BEC. The construction of finite length LDPC
codes is addressed in Section III. The proposed Scheduled-PEG
algorithm is also introduced in this section. Section IV
focuses on the optimization of the Scheduled-PEG algorithm
and presents simulation results. Finally, Section V concludes
the paper.

II. LDPC Codes and Performance Metric over the 
Erasure Channel 

A. Binary and non-binary LDPC codes 

In this paper we consider both binary and non-binary LDPC
codes defined over some finite field $\mathbb{F}_q$, with $q = 2^p$ [10]. If
$p = 1$, the code is binary. We fix, once and for all, a vector space
isomorphism:

$$\mathbb{F}_q \xrightarrow{\ \sim\ } \mathbb{F}_2^p \qquad (1)$$

Elements of $\mathbb{F}_q$ will be called symbols. We say that the binary
sequence $(x_0, \ldots, x_{p-1}) \in \mathbb{F}_2^p$ is the binary image of the symbol
$X \in \mathbb{F}_q$ iff they correspond to each other by the above
isomorphism.

An LDPC code is a linear code defined by a sparse
parity-check matrix $H \in M_{m,n}(\mathbb{F}_q)$. Alternatively, it can be
represented by a bipartite (Tanner) graph² [5], containing $n$
symbol-nodes and $m$ constraint-nodes associated respectively
with the $n$ columns and $m$ rows of $H$. A symbol-node and
a constraint-node are connected by an edge if and only if the
corresponding entry of $H$ is non-zero, in which case the edge
is assumed to be "labeled" by the non-zero entry.

Symbol-nodes take values in $\mathbb{F}_q$, and a constraint-node
is said to be verified if the linear combination of neighbor
symbols (with coefficients given by the corresponding edge
labels) is equal to zero. A non-binary word $(X_1, \ldots, X_n)$ is
a codeword if it verifies all the constraint nodes of the graph.

The degree of a node is by definition the number of edges 
incident to that node (number of non-zero entries on the 
corresponding row/column of $H$). A code is called $(d_s, d_c)$-regular
if all symbol-nodes are of degree $d_s$ and all constraint-nodes
are of degree $d_c$; otherwise it is called irregular.

Let $\Lambda_d$ and $\Gamma_d$ denote respectively the fractions of symbol
and constraint nodes of degree $d$. Let also $\lambda_d$ and $\rho_d$ be the
fractions of edges connected respectively to symbol and constraint
nodes of degree $d$. The degree distribution polynomials,
from the node and the edge perspective, are defined by:

$$\Lambda(x) = \sum_d \Lambda_d x^d, \qquad \Gamma(x) = \sum_d \Gamma_d x^d \qquad \text{(node-persp.)}$$

$$\lambda(x) = \sum_d \lambda_d x^{d-1}, \qquad \rho(x) = \sum_d \rho_d x^{d-1} \qquad \text{(edge-persp.)}$$

The designed code rate, denoted by $r$, is by definition:

$$r = 1 - \frac{\int_0^1 \rho(x)\,dx}{\int_0^1 \lambda(x)\,dx} = 1 - \frac{\Lambda'(1)}{\Gamma'(1)}$$

If the parity-check matrix is of rank $m$, then the (non-binary)
code dimension is equal to $k = n - m$, and $r$ is equal to the
code rate, that is $r = k/n$.

2 By abusing language, throughout the paper, the term "code" will be used
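As a small numerical illustration of the definitions above, the following sketch converts a node-perspective distribution into the edge perspective and evaluates the designed rate in both ways. The helper names (`node_to_edge`, `design_rate`, `design_rate_edge`) and the two toy distributions are assumptions made for this illustration; they are not taken from the paper.

```python
from fractions import Fraction

def node_to_edge(node_dist):
    """Convert node-perspective fractions {d: Lambda_d} into edge-perspective {d: lambda_d}."""
    avg_degree = sum(d * frac for d, frac in node_dist.items())  # Lambda'(1)
    return {d: d * frac / avg_degree for d, frac in node_dist.items()}

def design_rate(sym_nodes, chk_nodes):
    """Designed rate from the node perspective: r = 1 - Lambda'(1) / Gamma'(1)."""
    avg_sym = sum(d * f for d, f in sym_nodes.items())
    avg_chk = sum(d * f for d, f in chk_nodes.items())
    return 1 - avg_sym / avg_chk

def design_rate_edge(lam, rho):
    """Designed rate from the edge perspective: r = 1 - int(rho) / int(lambda)."""
    int_lam = sum(f / d for d, f in lam.items())  # integral of lambda_d x^(d-1) on [0, 1]
    int_rho = sum(f / d for d, f in rho.items())
    return 1 - int_rho / int_lam

# Hypothetical example 1: a (3,6)-regular code.
reg_sym, reg_chk = {3: Fraction(1)}, {6: Fraction(1)}
r_regular = design_rate(reg_sym, reg_chk)

# Hypothetical example 2: half of the symbol-nodes of degree 2, half of degree 4.
irr_sym, irr_chk = {2: Fraction(1, 2), 4: Fraction(1, 2)}, {6: Fraction(1)}
r_irregular = design_rate_edge(node_to_edge(irr_sym), node_to_edge(irr_chk))
```

Both toy ensembles have average symbol-node degree 3 and constraint-node degree 6, so the two formulas give the same designed rate of 1/2.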

We also denote by $K$ and $N$ the binary code dimension
and the binary code length, respectively. Hence, $K = kp$ and
$N = np$.

B. Iterative erasure decoding 

For binary LDPC codes over the BEC, the belief-propagation
(BP) decoding translates into a simple technique
of recovering the erased bits, by iteratively searching for
check-nodes with only one erased neighbor bit-node [11].
Indeed, if a check-node is connected to only one erased bit- 
node, the value of the latter can be recovered as the XOR of all 
the other neighbor bit-nodes. This value is then injected into 
the decoder, and the decoding continues by searching for some 
other check-node having a single erased neighbor bit-node. We 
remark that the erasure decoding can be performed on-the-fly: 
decoding starts as soon as the first bit is received and each 
new received bit is injected on-the-fly into the decoder. The 
decoding will stop by itself if all the bits have been recovered,
or when it "gets stuck" because every check-node that still has
erased neighbors is connected to at least two erased bit-nodes
(such a configuration is called a stopping set).
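The procedure described above can be sketched as follows for a binary code. This is a minimal illustration, not the decoder used in the paper: the function name `peel_decode`, the data layout (one list of bit indices per check-node, `None` for an erased bit), and the toy parity checks are assumptions of this sketch.

```python
def peel_decode(checks, received):
    """Iterative erasure decoding of a binary LDPC code over the BEC.

    checks   : list of lists, checks[i] holds the bit-node indices of check i
    received : list with 0/1 for received bits and None for erased bits
    Returns the recovered word; entries left at None belong to a stopping set.
    """
    word = list(received)
    progress = True
    while progress:
        progress = False
        for nbrs in checks:
            erased = [j for j in nbrs if word[j] is None]
            if len(erased) == 1:
                # The single erased bit equals the XOR of the other neighbors.
                value = 0
                for j in nbrs:
                    if word[j] is not None:
                        value ^= word[j]
                word[erased[0]] = value
                progress = True
    return word

# Hypothetical toy code (three parity checks on seven bits) and one of its codewords.
H_CHECKS = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
codeword = [1, 0, 1, 1, 0, 0, 1]

recovered = peel_decode(H_CHECKS, [1, 0, 1, 1, None, None, 1])
stuck = peel_decode(H_CHECKS, [None, 0, None, None, 0, 0, 1])
```

In the first call each erased bit is the only erasure of some check-node, so decoding completes. In the second call every check-node sees at least two erased bits, so the decoder stops without recovering anything.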

The above considerations can also be generalized in the case 
of non-binary codes. It is important to note that we consider 
non-binary LDPC codes over a binary erasure channel, which 
means that the channel erases bits from the binary image 
of the transmitted codeword. Hence, a coded symbol can be 
completely erased (all the bits of its binary image are erased), 
completely received (no bit of its binary image is erased), 
or partially erased/received (some bits of its binary image 
are erased, some others are received). At the receiver part, 
the received bits are used to (partially) reconstruct the corre- 
sponding symbols of the transmitted codeword. Thus, for each 
symbol node we can determine the set of eligible symbols, 
i.e. symbols whose binary images match the received bits. 
Such a set contains only one symbol (namely the transmitted 
one) if the corresponding symbol-node is completely received. 
These sets are then iteratively updated, according to the linear
constraints between symbol-nodes [12]. Alternatively, non-binary
LDPC codes can be decoded by using their extended
binary image [13].



C. Performance metrics for finite-length codes 

A performance metric that is often associated with on-the-fly
decoding is the decoding inefficiency, defined as the ratio
between the number of received bits when decoding completes
and the number of information bits [14]. More precisely, we
assume that the encoded bit-stream is permuted according
to some random permutation $\pi$. The permuted bit-stream is
sequentially delivered to the decoder, which performs erasure
decoding on-the-fly³: each bit is injected into the decoder in
the appropriate position and erasure decoding is performed
until either decoding completes or it gets stuck. If decoding
gets stuck, we inject the next bit from the permuted bit-stream.
We denote by $k_\pi$ the number of bits from the permuted
bit-stream that have been injected into the decoder when the
erasure decoding completes; the value $k_\pi - k$ is referred to
as reception overhead. The decoding inefficiency is defined as
$\mu(\pi) = k_\pi / k$. It is a random variable, whose value depends
on the (random) permutation $\pi$. Note also that $\mu(\pi) \in [1, 1/r]$,
where $r$ is the code rate. The average decoding inefficiency is
defined as the expected value of $\mu$; that is:

$$\bar{\mu} = E_\pi[\mu(\pi)] = \frac{E_\pi[k_\pi]}{k}$$

Note that $\bar{\mu} = 1$ if and only if the code is MDS (its minimum
distance is equal to $n - k + 1$). More accurate statistics about
the decoding inefficiency are provided by the probability of
decoding failure, defined as the complementary cumulative
distribution function (CCDF) of $\mu$:

$$F(x) = \Pr[\mu(\pi) > x], \qquad x \in \left[1, \frac{1}{r}\right]$$

Hence, $F(x)$ is the probability of decoding failure assuming
that the number of bits received from the channel is equal to
$kx$, or equivalently, the reception overhead is equal to $k(x - 1)$.
Indeed, the $kx$ bits received from the channel are the first $kx$
bits of the encoded bit-stream permuted by some permutation
$\pi$. Hence, a decoding failure occurs if and only if $\mu(\pi) > x$.
From the above definition, it also follows that:

$$\int_1^{1/r} F(x)\,dx = \bar{\mu} - 1$$

Therefore, if $C_1$ and $C_2$ are two codes such that $\bar{\mu}(C_1) < \bar{\mu}(C_2)$,
then $C_1$ presents a better performance in the waterfall region
of $F$, but this might happen at the expense of a higher error
floor⁴.
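The relation between the average inefficiency and the area under the failure-probability curve can be checked numerically on simulated data. The sketch below assumes that a list of $k_\pi$ values has already been obtained from some on-the-fly decoding simulation; the function name `inefficiency_stats` and the sample values are hypothetical.

```python
def inefficiency_stats(k_pi_samples, k):
    """Empirical mean inefficiency, empirical CCDF, and area under the CCDF.

    k_pi_samples : numbers of received bits at decoding completion (one per permutation)
    k            : number of information bits
    """
    mu = sorted(kp / k for kp in k_pi_samples)
    n = len(mu)
    mean = sum(mu) / n

    def ccdf(x):
        # Empirical F(x) = Pr[mu > x]
        return sum(1 for m in mu if m > x) / n

    # Exact integral of the empirical (step-function) CCDF, starting at x = 1:
    # on the interval [prev, m) exactly n - i samples are larger than x.
    area, prev = 0.0, 1.0
    for i, m in enumerate(mu):
        area += (m - prev) * (n - i) / n
        prev = m
    return mean, ccdf, area

# Hypothetical simulation output for a code with k = 1000 information bits.
mean_mu, F, area = inefficiency_stats([1050, 1062, 1071, 1049], 1000)
```

For the empirical distribution the area under the CCDF equals the mean inefficiency minus one up to floating-point rounding, which mirrors the identity stated above.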

Finally, we note that the probability of decoding failure can
also be expressed in relation with the fraction of erased bits
(rather than the fraction of received bits); in this case it will
be referred to as Frame Error Rate (FER). The Bit Error Rate
(BER) will denote the probability of a bit being erased (after
the decoding process), assuming that a certain fraction of bits
have been received.

3 Such a random reception corresponds to a randomly interleaved erasure
channel, which allows us to dispense with a specific loss model.

4 The waterfall is the region in which the failure probability decreases very
quickly as x increases. However, there might be a point after which the curve
does not fall as quickly as before, in other words, there is a region in which
performance flattens. This region is called the error floor region.

D. Asymptotic performance 

Unlike the finite-length performance, the asymptotic perfor- 
mance does not refer to the performance of a given code, but 
to the performance of a given family or ensemble of codes. 
Such an ensemble contains arbitrary-length codes that share 
the same properties in terms of distributions of node-degrees 
in the associated bipartite graph. 

Let $E(\lambda, \rho)$ denote the ensemble of LDPC codes of arbitrary
length $n > 0$, with edge-perspective degree distribution
polynomials $\lambda$ and $\rho$. When $n$ goes to infinity, (almost) all the
codes behave alike, and they exhibit a threshold phenomenon,
separating the region where reliable transmission is possible
from that where it is not. Assume that an arbitrary code
$C_n \in E(\lambda, \rho)$, of length $n$, is used over the BEC, and let
$p_e$ denote the channel erasure probability. The threshold of
the ensemble $E(\lambda, \rho)$ is defined as the supremum value of $p_e$
(i.e. the worst channel condition) that allows transmission with
an arbitrarily small error probability, assuming that $n$ goes to
infinity. Let us denote this threshold value by $p_{th}(\lambda, \rho)$. The
threshold value is necessarily below the capacity limit,
that is $p_{th}(\lambda, \rho) < 1 - r$, where $r$ is the (asymptotic) code
rate of the ensemble $E(\lambda, \rho)$. Roughly speaking, this means
that if an encoded sequence of length $n$ is transmitted over
the channel, it can be successfully decoded iff the fraction of
erased bits is less than some $p_e < p_{th}(\lambda, \rho)$, with
$p_e \to p_{th}(\lambda, \rho)$ as $n \to +\infty$. It is assumed here that the
girth of the graph goes to infinity with $n$, which actually happens
for almost all the codes in $E(\lambda, \rho)$. It follows that the decoding
inefficiency, which can be expressed as
$\frac{(1 - p_e) N}{K} = \frac{1 - p_e}{r}$, also goes to a threshold value:

$$\mu_{th} = \frac{1 - p_{th}}{r},$$

which will be referred to as inefficiency threshold.

Given an ensemble E(X,p), its threshold value can be 
efficiently computed by tracking the fraction of erased mes- 
sages passed during the belief propagation decoding. This 
method is called density evolution (the name is due to the 
fact that over more general channels, we have to track the 
message densities). For more details on density evolution we 
refer to [4] for binary codes, and [15] and [12] for non-binary
LDPC codes. The introduction of irregular codes, as
well as the asymptotic optimization based on the density 
evolution method, made possible the construction of capacity 
approaching ensembles of LDPC codes 0. 
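For binary codes over the BEC, density evolution reduces to the well-known scalar recursion $x_{l+1} = p_e \, \lambda(1 - \rho(1 - x_l))$, so the threshold can be located by bisection on $p_e$. The sketch below is only an illustration under that assumption: it does not cover the non-binary case, and the function names, iteration limits, and the (3,6)-regular test ensemble are choices made here rather than taken from the paper.

```python
def poly_edge(coeffs, x):
    """Evaluate an edge-perspective polynomial sum_d c_d * x^(d-1)."""
    return sum(c * x ** (d - 1) for d, c in coeffs.items())

def converges(p, lam, rho, iters=5000, tol=1e-10):
    """Track the erased-message fraction; True if it is driven to (almost) zero."""
    x = p
    for _ in range(iters):
        x = p * poly_edge(lam, 1 - poly_edge(rho, 1 - x))
        if x < tol:
            return True
    return False

def bec_threshold(lam, rho, steps=40):
    """Bisection on the channel erasure probability."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if converges(mid, lam, rho):
            lo = mid   # decoding succeeds: the threshold is at least mid
        else:
            hi = mid   # decoding fails: the threshold is below mid
    return lo

# Hypothetical test ensemble: (3,6)-regular binary codes, rate r = 1/2.
p_th = bec_threshold({3: 1.0}, {6: 1.0})
mu_th = (1 - p_th) / 0.5   # inefficiency threshold (1 - p_th) / r
```

The (3,6)-regular ensemble has a known BEC threshold of about 0.4294, which corresponds to an inefficiency threshold of about 1.141, noticeably worse than the optimized irregular ensembles considered in this paper.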

III. Finite length LDPC codes construction 

As discussed in the above section, the asymptotic threshold 
can be approached by long codes, which do not contain short 
cycles. Short cycles may also harm the performance of short 
(finite-length) codes, as they can result in short stopping sets. 
Hence, the PEG algorithm has been proposed, and is widely 
used, for constructing bipartite graphs with large girth, in a 
best effort sense, by progressively establishing edges between 
symbol and check nodes in an edge-by-edge manner [6].



A. Progressive Edge Growth algorithm 

A bipartite (Tanner) graph is denoted as $(S, C, E)$, where
$S = \{s_1, s_2, \ldots, s_n\}$ is the set of symbol-nodes,
$C = \{c_1, c_2, \ldots, c_m\}$ is the set of constraint nodes and
$E \subseteq S \times C$ is the set of edges. An edge $(s_j, c_i) \in E$
corresponds to a non-zero entry $h_{i,j}$ of the parity check matrix $H$.
We also denote by $D_s$ the "target sequence" of symbol-node degrees,
which is assumed to be sorted in non-decreasing order:

$$D_s = \{d_{s_1}, d_{s_2}, \ldots, d_{s_n} \mid d_{s_1} \le d_{s_2} \le \ldots \le d_{s_n}\}$$

where $d_{s_j}$ is the degree of symbol node $s_j$.

When the PEG algorithm starts, the set of edges is empty,
$E = \emptyset$. Edges will be progressively added to $E$, as explained
shortly. Given a symbol node $s_j$, we denote by $\bar{C}_{s_j}$ the set
of constraint-nodes whose distance to $s_j$ is maximum, in the
current settings; that is, given the current set of edges. The
distance between two nodes is the length of the shortest path
connecting them. If there is no path between $s_j$ and some
constraint node $c_i$, the distance between them is set to $+\infty$.
Hence, if $E_{s_j} = \emptyset$ (no edge is incident to $s_j$ yet), the
distance from $s_j$ to any constraint-node is $+\infty$ and
$\bar{C}_{s_j} = C$, the set of all the constraint-nodes. If
$E_{s_j} \neq \emptyset$, $\bar{C}_{s_j}$ can be determined by expanding a subgraph
from symbol node $s_j$ up to the maximal depth (see [6]).

Finally, we use $c_i \leftarrow \{\bar{C}_{s_j} \mid \min \deg\}$ to denote a random
constraint node $c_i \in \bar{C}_{s_j}$ having the lowest degree (given the
current set of edges of the graph). The PEG algorithm can be
summarized as follows:



Algorithm 1 Progressive Edge Growth algorithm

for $j = 1$ to $n$ do
    for $k = 1$ to $d_{s_j}$ do
        Determine $\bar{C}_{s_j}$, given the current $E$;
        $c_i \leftarrow \{\bar{C}_{s_j} \mid \min \deg\}$;
        Add edge $(s_j, c_i)$ to $E$;
    end for
end for
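A compact executable sketch of Algorithm 1 is given below. It follows the description above literally (breadth-first expansion from the current symbol-node, then a random choice among the farthest constraint-nodes of lowest degree) and makes no attempt at efficiency. The function names, the adjacency-set representation, and the toy degree sequence are assumptions of this sketch; it also assumes that no target degree exceeds the number of constraint-nodes.

```python
import random
from collections import deque

def farthest_checks(s, sym_adj, chk_adj, m):
    """Constraint-nodes at maximal distance from symbol-node s (unreachable = infinite)."""
    depth_of = {}                       # constraint-node index -> BFS depth
    seen_sym = {s}
    frontier = deque([(s, 0)])
    while frontier:
        sym, depth = frontier.popleft()
        for c in sym_adj[sym]:
            if c not in depth_of:
                depth_of[c] = depth
                for s2 in chk_adj[c]:
                    if s2 not in seen_sym:
                        seen_sym.add(s2)
                        frontier.append((s2, depth + 1))
    unreachable = [c for c in range(m) if c not in depth_of]
    if unreachable:
        return unreachable
    far = max(depth_of.values())
    return [c for c, d in depth_of.items() if d == far]

def peg(degrees, m, rng=random):
    """Node-by-node PEG; `degrees` is the target sequence, sorted non-decreasingly."""
    n = len(degrees)
    sym_adj = [set() for _ in range(n)]
    chk_adj = [set() for _ in range(m)]
    for j in range(n):
        for _ in range(degrees[j]):
            cand = farthest_checks(j, sym_adj, chk_adj, m)
            low = min(len(chk_adj[c]) for c in cand)
            c = rng.choice([c for c in cand if len(chk_adj[c]) == low])
            sym_adj[j].add(c)           # add edge (s_j, c_i) to E
            chk_adj[c].add(j)
    return sym_adj, chk_adj

# Hypothetical toy instance: 8 symbol-nodes, 4 constraint-nodes.
toy_degrees = [2] * 6 + [3] * 2
toy_sym_adj, toy_chk_adj = peg(toy_degrees, 4, random.Random(0))
```

Passing a seeded `random.Random` instance makes the construction reproducible, which is convenient when several graphs of an ensemble are to be compared.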



We note that edges are established in a node-by-node man- 
ner, meaning that all edges incident to some symbol-node are 
established before moving to the next symbol-node. We have 
further assumed that symbol nodes are sorted in increasing 
order with respect to their degrees. Finally, we observe that the 
constraint-node degree distribution of the constructed Tanner 
graph is almost uniform, i.e. all constraint nodes have only 
one or at most two consecutive degrees [6].

Let $\mathcal{G}_{PEG}(n, m, D_s)$ denote the ensemble of all Tanner
graphs constructed by using the PEG algorithm. The average
inefficiency ratio over all graphs in $\mathcal{G}_{PEG}(n, m, D_s)$ is defined
as:

$$\bar{\mu}_{PEG}(n, m, D_s) = E[\bar{\mu}(C) \mid C \in \mathcal{G}_{PEG}(n, m, D_s)]$$

When parameters $(n, m, D_s)$ are implied, it will be simply
denoted by $\bar{\mu}_{PEG}$. The corresponding standard deviation is
denoted by $\sigma_{PEG}$.



B. Modified Progressive Edge Growth algorithm 

The PEG graphs have large girth with respect to random 
graphs. Consequently, PEG graphs have low error floor in 
comparison with random graphs. However, the error floor 
can be further lowered, by using a modification of the PEG 
algorithm, called ModPEG. 

Let $S_d$ denote the set of symbol nodes of degree $d$, where
$1 \le d \le d_{max}$ and $d_{max}$ denotes the maximum symbol-node
degree. Let $n_d$ denote the number of symbol nodes within $S_d$,
hence $\sum_d n_d = n$.

Within the ModPEG algorithm, for each $S_d$, edges are established
in a degree-by-degree manner: a first edge is established
for each symbol-node, then a second edge is established for
each symbol-node, and so on until all the symbol-nodes in $S_d$
reach the required degree $d$.



Algorithm 2 Modified Progressive Edge Growth algorithm

for $d = 1$ to $d_{max}$ do
    for $k = 1$ to $d$ do
        for $s_j \in S_d$ do
            Determine $\bar{C}_{s_j}$, given the current $E$;
            $c_i \leftarrow \{\bar{C}_{s_j} \mid \min \deg\}$;
            Add edge $(s_j, c_i)$ to $E$;
        end for
    end for
end for



Let $\mathcal{G}_{ModPEG}(n, m, D_s)$ denote the ensemble of all Tanner
graphs constructed by using the ModPEG algorithm. The
average inefficiency ratio over all graphs in $\mathcal{G}_{ModPEG}(n, m, D_s)$
is defined as:

$$\bar{\mu}_{ModPEG}(n, m, D_s) = E[\bar{\mu}(C) \mid C \in \mathcal{G}_{ModPEG}(n, m, D_s)]$$

When parameters $(n, m, D_s)$ are implied, it will be simply
denoted by $\bar{\mu}_{ModPEG}$. The corresponding standard deviation is
denoted by $\sigma_{ModPEG}$.

Example III.1. We consider irregular LDPC codes of rate
1/2, defined over $\mathbb{F}_2$, $\mathbb{F}_4$, $\mathbb{F}_8$, $\mathbb{F}_{16}$. The node-perspective
symbol-node degree distributions and the asymptotic performance
are shown in Table I (the constraint-node degree distributions
are considered to be almost uniform). These codes
have been optimized by using density evolution methods. The
binary code can be found in [16], while non-binary codes have
been optimized within this work. For each degree distribution,
100 Tanner graphs have been constructed, by both PEG
and ModPEG algorithms, for codes with binary dimension
$K = 5000$. The corresponding average inefficiency ratios are
also shown in Table I. The details of these inefficiency ratios
are shown in Figure 1, where it can be seen that using $\approx 20$
graphs is sufficient for obtaining good estimates of $\bar{\mu}_{PEG}$ and
$\bar{\mu}_{ModPEG}$ values.

We also observe that, in all the four cases, the average
inefficiency ratios $\bar{\mu}_{PEG}$ and $\bar{\mu}_{ModPEG}$ are much larger than $\mu_{th}$.



Table I: Optimized irregular LDPC codes, with rate 1/2

| Alphabet | Degree distributions | $p_{th}$ | $\mu_{th}$ | $\bar{\mu}_{PEG}$ | $\sigma_{PEG}$ | $\bar{\mu}_{ModPEG}$ | $\sigma_{ModPEG}$ |
|---|---|---|---|---|---|---|---|
| $\mathbb{F}_2$ | $\Lambda(x) = 0.5489x^{2} +$ [remaining terms illegible]; $\Gamma(x) = 0.6609x^{8} + 0.3391x^{9}$ | 0.4955 | 1.009 | 1.0829 | 1.475e-04 | 1.0876 | 1.202e-04 |
| $\mathbb{F}_4$ | $\Lambda(x) = 0.7140x^{2} + 0.2173x^{4} + 0.0687x^{12}$; $\Gamma(x) = 0.7586x^{6} + 0.2414x^{7}$ | 0.4926 | 1.0148 | 1.0604 | 2.903e-04 | 1.0685 | 2.117e-04 |
| $\mathbb{F}_8$ | $\Lambda(x) = 0.7857x^{2} + 0.0529x^{3} + 0.1153x^{5} + 0.0461x^{12}$; $\Gamma(x) = 0.2797x^{5} + 0.7203x^{6}$ | 0.4931 | 1.0138 | 1.0655 | 3.188e-04 | 1.0726 | 2.8e-04 |
| $\mathbb{F}_{16}$ | $\Lambda(x) = 0.8460x^{2} + 0.1056x^{5} + 0.0252x^{8} + 0.0232x^{18}$; $\Gamma(x) = 0.3221x^{5} + 0.6779x^{6}$ | 0.4945 | 1.011 | 1.0706 | 6.81e-04 | 1.0834 | 3.536e-04 |



Figure 1: Average inefficiency ratios of ensemble Tanner graphs constructed by using optimized Scheduled PEG algorithm in
comparison with original PEG and modified PEG algorithms for K = 5000 information bits. Panels: (a) $\mathbb{F}_2$, (b) $\mathbb{F}_4$, (c) $\mathbb{F}_8$, (d) $\mathbb{F}_{16}$; horizontal axis: code index $i$; legend: $\bar{\mu}_{PEG}$, $\sigma_{PEG}$, $\bar{\mu}_{ModPEG}$, $\sigma_{ModPEG}$.



Also, the average inefficiency ratios corresponding to ModPEG
are larger than those corresponding to the original PEG.
The reason is that the ModPEG algorithm improves the error
floor region but also worsens the waterfall region, as can be
seen in Figure 2, which shows eight BER curves, corresponding
to eight LDPC codes drawn from the corresponding PEG and
ModPEG ensembles of graphs.

C. Scheduled Progressive Edge Growth algorithm 

Both PEG and ModPEG algorithms construct Tanner graphs
with large girth and low error floor. For optimized irregular
LDPC codes, however, the average inefficiency of finite length
graphs is far away from the predicted (asymptotic) threshold
(even for codes with $K = 5000$ and $N = 10000$ bits, cf.
Example III.1).



The Scheduled Progressive Edge Growth (SPEG) algorithm, 
proposed in this section, aims to improve the average ineffi- 
ciency of irregular LDPC codes. 

We fix an integer $T \ge 1$. We consider a collection of disjoint
symbol-node subsets $S_d^{(t)} \subseteq S$, indexed by $t \in \{1, \ldots, T\}$
and $d \in \{1, \ldots, d_{max}\}$, such that:

• $S_d^{(t)} \subseteq S_d$ ($S_d^{(t)}$ contains only symbol-nodes of degree $d$)

• $\bigcup_{d=1}^{d_{max}} \bigcup_{t=1}^{T} S_d^{(t)} = S$

We denote by $n_d^{(t)}$ the number of symbol-nodes in $S_d^{(t)}$. It follows
that $S_d = \bigcup_{t=1}^{T} S_d^{(t)}$, $n_d = \sum_{t=1}^{T} n_d^{(t)}$, and
$n = \sum_{d=1}^{d_{max}} \sum_{t=1}^{T} n_d^{(t)}$.

The scheduled PEG algorithm works as follows: at time $t$, it
connects progressively symbol-nodes within the subsets $S_d^{(t)}$,
for $d = 1, \ldots, d_{max}$. For each subset $S_d^{(t)}$, edges are established
in a degree-by-degree manner.

Figure 2: BER performance of Tanner graphs constructed by
using PEG and ModPEG algorithms, for irregular LDPC codes
shown in Table I with K = 5000 bits.



Algorithm 3 Scheduled Progressive Edge Growth algorithm

for $t = 1$ to $T$ do
    for $d = 1$ to $d_{max}$ do
        for $k = 1$ to $d$ do
            for $s_j \in S_d^{(t)}$ do
                Determine $\bar{C}_{s_j}$, given the current $E$;
                $c_i \leftarrow \{\bar{C}_{s_j} \mid \min \deg\}$;
                Add edge $(s_j, c_i)$ to $E$;
            end for
        end for
    end for
end for
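The three constructions differ only in the order in which symbol-nodes receive their edges. The sketch below makes that order explicit by listing, for each edge placement, the index of the symbol-node being served; the edge selection itself is left out. The data layout (`subsets[t][d]` as a list of node indices) and the function names are assumptions made for this illustration.

```python
def speg_schedule(subsets):
    """Return symbol-node indices in the order SPEG places their edges.

    subsets[t][d] is the list of degree-d symbol-nodes assigned to time t.
    """
    order = []
    for by_degree in subsets:               # t = 1, ..., T
        for d in sorted(by_degree):         # d = 1, ..., d_max
            for _ in range(d):              # k = 1, ..., d
                order.extend(by_degree[d])  # one new edge per node of S_d^(t)
    return order

def peg_schedule(degrees):
    """Node-by-node order of the original PEG algorithm."""
    return [j for j, d in enumerate(degrees) for _ in range(d)]

# Hypothetical toy degree sequence: nodes 0, 1 have degree 2; nodes 2, 3 have degree 3.
degrees = [2, 2, 3, 3]
modpeg_order = speg_schedule([{2: [0, 1], 3: [2, 3]}])                # T = 1
split_order = speg_schedule([{2: [0], 3: [2]}, {2: [1], 3: [3]}])     # T = 2
single_order = speg_schedule([{2: [0]}, {2: [1]}, {3: [2]}, {3: [3]}])  # T = n
```

With a single time slot the order is the degree-by-degree order of ModPEG, while one node per subset reproduces the node-by-node order of the original PEG algorithm.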



We denote by $\mathcal{G}_{SPEG}(n, m, \{S_d^{(t)}\})$ the ensemble of
all Tanner graphs constructed by using the SPEG algorithm.
The average inefficiency ratio over all graphs in
$\mathcal{G}_{SPEG}(n, m, \{S_d^{(t)}\})$ is defined as:

$$\bar{\mu}_{SPEG}(n, m, \{S_d^{(t)}\}) = E[\bar{\mu}(C) \mid C \in \mathcal{G}_{SPEG}(n, m, \{S_d^{(t)}\})]$$

When parameters $(n, m, \{S_d^{(t)}\})$ are implied, it will be simply
denoted by $\bar{\mu}_{SPEG}$. The corresponding standard deviation
is denoted by $\sigma_{SPEG}$.

In the simple case of $T = 1$, we have $S_d^{(1)} = S_d$, and
the SPEG algorithm is equivalent to the ModPEG algorithm.
On the other hand, if $T = n$, each subset contains one
single symbol-node, and the SPEG algorithm is equivalent to
the original PEG algorithm. For a general $T$ value, the SPEG
algorithm is in between these two extreme cases. It allows
one to explore the ensemble of LDPC codes with fixed code-length
and degree distributions and, as explained shortly, is
aimed at finding codes with very small average inefficiency.
As observed in Section II-C, a lower average inefficiency
corresponds to better performance of the code in the waterfall
region. However, generally there is a tradeoff between
waterfall and error floor regions. Hence, to prevent excessive
degradation in the error floor region, the edges within each
$S_d^{(t)}$ subset are established in a degree-by-degree manner.
The use of the SPEG algorithm can potentially improve the 
average inefficiency of the constructed LDPC code, as shown 
in the following example. 

Example III.2. We consider non-binary LDPC codes with
rate 1/2, defined over $\mathbb{F}_{16}$, whose node-perspective degree
distribution polynomials are given in Table I. The asymptotic
threshold is equal to $p_{th} = 0.4945$, corresponding to an
asymptotic inefficiency $\mu_{th} = 1.011$.

We want to construct a code with binary dimension $K = 5000$.
Hence, the Tanner graph contains $n = 2500$ symbol-nodes
and $m = 1250$ constraint-nodes. We use the SPEG
algorithm with $T = 3$, such that $n_d^{(t_1)} = n_d^{(t_2)}$ for any
$1 \le t_1, t_2 \le T$. This means that each $S_d$ is partitioned into three
subsets $S_d^{(t)}$, $t = 1, 2, 3$, of the same cardinality.

As in Example III.1, we estimated the average inefficiency
ratio of the ensemble $\mathcal{G}_{SPEG}(n, m, \{S_d^{(t)}\})$ by simulating 100
SPEG-codes. We obtained $\bar{\mu}_{SPEG} = 1.0536$, which has to be
compared with $\bar{\mu}_{PEG} = 1.0706$ and $\bar{\mu}_{ModPEG} = 1.0834$ from
Table I.

This example shows that an appropriate choice of the
subsets $\{S_d^{(t)}\}$ may improve the average inefficiency of the
constructed code. In the following section we propose a
method for optimizing this choice, by minimizing the
average inefficiency of the corresponding ensemble of
SPEG-codes.

IV. Optimized Scheduled-PEG construction 
A. Optimization algorithm 

The main idea behind the SPEG algorithm is that different
choices of the scheduling subsets $\{S_d^{(t)}\}$ might lead
to codes with different performance. Our purpose is to find
scheduling subsets that minimize the average inefficiency
of the corresponding ensemble of SPEG-codes. In order to
properly formulate this optimization problem, we have to
take into consideration only the "profile" of the scheduling
subsets, which consists of the fractions of nodes within each
subset. Precisely, let $f_d^{(t)}$ denote the fraction of symbol-nodes
contained in $S_d^{(t)}$, hence:

A family of parameters $\{f_d^{(t)}\}_{1 \le t \le T,\ 1 \le d \le d_{max}}$ satisfying the
condition (2) above is called a scheduling distribution.

Figure 3: Flowchart of the differential evolution optimization
algorithm

When parameters $(n, m, \{f_d^{(t)}\})$ are fixed, we consider
that the SPEG algorithm starts by randomly choosing a
family of scheduling subsets $\{S_d^{(t)}\}$, according to the given
scheduling distribution, and then it constructs a (random)
Tanner graph as explained in the above section. We denote by
$\mathcal{G}_{SPEG}(n, m, \{f_d^{(t)}\})$ the corresponding ensemble of SPEG
Tanner graphs, and we define its average inefficiency by:

$$\bar{\mu}_{SPEG}(n, m, \{f_d^{(t)}\}) = E[\bar{\mu}(C) \mid C \in \mathcal{G}_{SPEG}(n, m, \{f_d^{(t)}\})]$$

When parameters $n$ and $m$ are fixed, it will be simply denoted
by $\bar{\mu}_{SPEG}(\{f_d^{(t)}\})$. This represents the objective function of
our optimization problem. Although it cannot be computed
analytically, $\bar{\mu}_{SPEG}(\{f_d^{(t)}\})$ can be efficiently estimated by
simulating a finite number of codes $C \in \mathcal{G}_{SPEG}(n, m, \{f_d^{(t)}\})$.
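The random choice of scheduling subsets can be sketched as follows. Since the exact normalization of the scheduling distribution is not reproduced here, the sketch assumes that, for each degree $d$, the fractions $f_d^{(1)}, \ldots, f_d^{(T)}$ are relative to $n_d$ and sum to 1; this convention, the function name `draw_subsets`, and the rounding rule are assumptions of the illustration.

```python
import random

def draw_subsets(nodes_by_degree, sched, rng=random):
    """Randomly split each S_d into T subsets of sizes close to f_d^(t) * n_d.

    nodes_by_degree : {d: list of symbol-node indices of degree d}
    sched           : {d: [f_d^(1), ..., f_d^(T)]}, assumed to sum to 1 for each d
    Returns a list of T dictionaries, subsets[t][d] = nodes of S_d^(t+1).
    """
    T = len(next(iter(sched.values())))
    subsets = [dict() for _ in range(T)]
    for d, nodes in nodes_by_degree.items():
        shuffled = list(nodes)
        rng.shuffle(shuffled)               # random assignment of nodes to subsets
        start, cum = 0, 0.0
        for t in range(T):
            cum += sched[d][t]
            # Cumulative rounding keeps the subset sizes consistent with n_d.
            end = len(shuffled) if t == T - 1 else round(cum * len(shuffled))
            subsets[t][d] = shuffled[start:end]
            start = end
    return subsets

# Hypothetical toy instance: six nodes of degree 2, three nodes of degree 3, T = 3.
toy_nodes = {2: list(range(6)), 3: list(range(6, 9))}
toy_sched = {2: [0.5, 0.5, 0.0], 3: [1 / 3, 1 / 3, 1 / 3]}
toy_subsets = draw_subsets(toy_nodes, toy_sched, random.Random(0))
```

Each call with a different random state yields a different family of subsets with the same profile, which is what makes the average over the ensemble depend only on the scheduling distribution.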

Since our objective function cannot be expressed analytically,
we address the optimization problem by using generic
population-based metaheuristic optimization algorithms. More
precisely, the differential evolution algorithm [17] is used,
which optimizes $\bar{\mu}_{SPEG}(\{f_d^{(t)}\})$ by iteratively trying to improve
the best current solution $\{f_d^{(t)}\}$. The flowchart of the
optimization algorithm is illustrated in Figure 3.

The algorithm maintains a population of candidate solutions, 
which is randomly initialized. At each iteration, the current 
population is evaluated, so as to find the best current
solution. Then, a population of new candidate solutions is
generated by combining existing candidates (mutation and
recombination), and then keeping whichever candidate solution
has the best score. In this way the objective function is treated
as a black box that provides a measure of quality of the 
candidate solutions. 
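The loop described above can be illustrated by a minimal differential evolution routine in the common DE/rand/1/bin variant. The control parameters (`F`, `CR`, population size), the clipping to the unit box, and above all the toy objective are assumptions of this sketch: in the actual optimization the black box would be the simulated average inefficiency of a set of SPEG codes, which is far more expensive to evaluate.

```python
import random

def differential_evolution(objective, dim, pop_size=20, iters=100,
                           F=0.8, CR=0.9, rng=None):
    """Minimal DE/rand/1/bin minimizer over the unit box [0, 1]^dim."""
    rng = rng or random.Random()
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)      # at least one coordinate is mutated
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < CR:
                    v = pop[a][k] + F * (pop[b][k] - pop[c][k])  # mutation
                    v = min(1.0, max(0.0, v))                    # stay in the box
                else:
                    v = pop[i][k]                                # recombination
                trial.append(v)
            s = objective(trial)
            if s <= scores[i]:               # selection: keep the better candidate
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def normalize(weights):
    """Map raw weights to fractions summing to 1 (one degree, T subsets)."""
    total = sum(weights) or 1.0
    return [w / total for w in weights]

# Hypothetical black-box objective standing in for the simulated inefficiency:
# squared distance between the normalized candidate and an arbitrary target.
TARGET = [0.2, 0.5, 0.3]

def toy_objective(weights):
    f = normalize(weights)
    return sum((x - y) ** 2 for x, y in zip(f, TARGET))

best_weights, best_score = differential_evolution(toy_objective, 3, rng=random.Random(1))
```

Optimizing unconstrained weights and normalizing them inside the objective is one simple way of keeping every candidate a valid distribution; other parameterizations are equally possible.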

B. Simulation results 

We consider four ensembles of irregular LDPC codes
of rate 1/2, defined over $\mathbb{F}_2$, $\mathbb{F}_4$, $\mathbb{F}_8$, $\mathbb{F}_{16}$, whose
node-perspective symbol-degree distributions are shown in Table I
(the constraint-node degree distributions are considered to be
almost uniform).

Figure 4: Average inefficiency ratios of the ensembles

We fixed the parameter $T = 3$ and, for each of the
above ensembles, we used the differential evolution algorithm
to find a scheduling distribution that minimizes the average
inefficiency of the corresponding SPEG codes. We considered
codes with binary dimension $K = 5000$; hence, the
binary code-length is $N = 10000$ and the non-binary
code-length (number of symbol-nodes of the graph) is $n = N/p$,
where $p \in \{1, 2, 3, 4\}$. The optimized scheduling distributions
$\{f_d^{(t)}\}$ and the corresponding inefficiencies $\bar{\mu}_{SPEG}(\{f_d^{(t)}\})$
are shown in Table II. For comparison purposes, we have
also displayed in Table II the average inefficiencies of the
corresponding PEG, ModPEG, and random⁵ ensembles. We
can see that the SPEG algorithm significantly improves the
average inefficiency ratios in comparison with those of the
PEG and ModPEG algorithms.

5 Random codes with given n, m, and node-degree distribution polynomials.

Table II: Optimized scheduling distributions and corresponding inefficiency ratios, for irregular LDPC codes of rate 1/2, defined
over $\mathbb{F}_2$, $\mathbb{F}_4$, $\mathbb{F}_8$ and $\mathbb{F}_{16}$, with binary dimension $K = 5000$

Moreover, we constructed several finite length codes over
$\mathbb{F}_{16}$, using the same scheduling distribution that has been
optimized for $K = 5000$. The average inefficiency of these
codes is plotted in Figure 4. It can be observed that they
significantly outperform codes constructed by the original PEG
algorithm.

For $K = 5000$, Figure 5 displays the bit error rates of
one PEG code, one ModPEG code, and one SPEG code 
with optimized scheduling distribution (these codes are chosen 
among the best codes of the corresponding ensembles). We 
can see that the SPEG algorithm significantly improves the 
waterfall region, at the expense of a slightly higher error floor. 

V. Conclusion 

The proposed Scheduled-PEG algorithm allows the en- 
hancement of the classical PEG algorithm, by the introduction 
of a scheduling distribution that specifies the order in which 
edges are established in the graph. The scheduling distribution 
provides a way for exploring the ensemble of LDPC codes 
with fixed code-length and degree distributions, and is aimed 
at finding codes with very small average inefficiency. 

We showed that the SPEG algorithm can be successfully 
combined with genetic optimization algorithms, which signif- 
icantly improves the average inefficiency of the constructed 
LDPC codes over the classical-PEG construction. In terms of 
error rate curves, this translates into a significant improvement 
of the waterfall region. 

Finally, we remark that the optimization of the schedul- 
ing distribution makes use of the specific channel model 
(through the use of the decoding inefficiency). Hence, LDPC 
codes constructed by using the SPEG algorithm together with 
an optimized scheduling distribution are channel dependent. 
However, the proposed algorithm could be extended to
more general channel models (e.g. by optimizing with respect
to a target FER, or to the area under the FER curve).